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Abstract 

A new noniterative approach to determine displacement vector fields 
with discontinuities is described. 

In order to overcome the limitations of current methods, the prob- 
lem is regarded as a general modelling problem. From this point of 
view the imaging field consists of a set of regions with common prop- 
erties and their boundaries. The strategy proposed is an analysis of 
consistency of the displacement estimators between different levels of 
regularization. 

A fully regularized family of model based displacement vector fields 
is constructed by successive smoothing and subsampling. By measuring 
the difference in description length the compatibility between different 
levels of regularization is measured. This gives local but noisy evidence 
of possible model boundaries at multiple scales. 

With the two constraints of continous lines of discontinuities and the 
spatial coincidence assumption consistent boundary evidence is found. 
Based on this combined evidence the model pyramid is updated, now 
describing homogeneous regions with sharp discontinuities. 
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1 Introduction 

The problem of determining discontinuities in displacement vector fields arises in 
both Binocular Stereo and Motion analysis. It has received much attention in 
computer vision research. There are two ways of viewing this problem. It can be 
seen as an "edge" detection problem in a vector field, or it can be seen as a problem 
of finding regions with common properties. Both views are complementary to each 
other. The latter view relates the question of discontinuities in vector fields to 
the old problem of segmentation, with the important difference that regions are 
not defined by the the image property "gray value", but by the object property 
"displacement". In the field of displacement vector field estimation there are, 
however, few attempts to relate this to segmentation approaches. The domain of 
segmentation techniques have been traditionally and still are gray value related 
features, but hardly vector fields as they appear in Stereo or motion. 

The reason for this is the fact that local displacement vector field (DVF) esti- 
mates have large variations in their variance, which is not only due to noise effects, 
but is inherentto the displacement estimation itself, depending on the image geom- 
etry. In "normal" segmentation tasks the variance of a modelled regional property 
is mainly determined by noise, and therefore the task can in principle be solved 
with an adequate noise model. An excellent example of this is described in [Lec89]. 

A number of approaches have been applied to solve the problem of finding 
displacement information and the discontinuities of it. It seems reasonable to 
divide them into three different strategies. 

1.1 Current Approaches 

1.1.1 Discontinuities from Displacement Vector Field 

The first approach separates the problems of the determination of the displace- 
ments and the detection of discontinuities. In the first step it is assumed that 
there are no discontinuities, and a fully regularized solution is determined. Exam- 
ples of this strategy are the "Optical Flow" algorithm [HS81], surface interpolation 
applying the model of the thin plate [Gri81, Ter83], or the membrane model in 
matching [Bro8l], and a series of motion estimation schemes [Nag83, Hil84, Den86, 
NE86, Ana87, TWK87]. The detection of discontinuities follows as a second step 
on the basis of "tensions" in the determined vector field. These can in principle be 
found by significant gradients [DS88] or by the zero-crossings of the second deriva- 
tive of the flow field [TMB85]. It must be stressed that up to now only this type 
of approach for finding a dense displacement vector field has a time complexity 
that allows a real-time implementation when an appropriate control strategy in 
combination with an efficient multi-grid solver is used [SD89]. 

It has been shown that the optical flow allows qualitative statements about 



objects' motion, but that the quantitative optical flow is not a reliable measure of 
the real motion [VP87, SU87]. This applies even more to the derivatives of the flow 
field. It can be expected, and indeed it can be demonstrated, that discontinuities 
determined in this manner deviate considerably from the actual ones in real-world 

sequences. 

Another approach was the segmentation of the optical flow field with a model 
based Hough transform, using the constraints of 3D rigid body motion [Adi85]. 

1.1.2 Simultaneous estimation of Discontinues and Vector Field 

An appealing approach from a theoretical point of view is to minimize a single 
functional, where different processes are combined. Typical elements of this func- 
tional are a similarity function between potentiaUy corresponding image patches, 
the regularization conditions, and a line or edge process, that limits the regular- 
ization and therefore allows "continous discontinuites". Following this approach 
the determination of the displacements and the detection of discontinuities are 

interwoven [KMY86, WW85]. 

The first problem with this approach is that the constructed functional is no 
longer convex, it has local minima that can be far away from the global minimum. 
(Strictly speaking, the functional of the continous approach is not convex either, 
the convexity is enforced by a local approximation of the gray values, and most local 
minima are avoided by a coarse-to-fine control strategy [NE86, Ana87, TWK87, 

DS88]). 

A more severe problem is the open question of how to weigh the different parts 
of the functional. Typically these coupling constants are determined heuristically. 
It has been shown that the optimal choice depends on the image content, and can 
vary within an image [Yui87]. In the extreme case so many free parameters are 
introduced that the entire approach is led ad absurdum, since its elegance is based 
upon the idea of reducing a complex problem to a simple principle. The approach 
can only be succesful when these parameters can be found from apriori principles 
and the image data themselves. Whether the proposed min-max principle [GY89], 
where the functional is maximized with respect to the coupling parameters and 
minimized with respect to the displacements, is adequate for real image sequences, 
is an open question, especially with space-variant parameters. Also, the computa- 
tional costs are immense, because for every configuration of parameters the entire 
minimization problem has to be solved. 

1.1.3 Discontinuities without full Displacement Vector Field 

This approach also separates the two problems, the discontinuities are, however, 
detected first. When this segmentation problem is solved, then the computation 
of the displacements becomes simple, often just a least squares estimation or the 



minimization of a simple quadratic functional. This se S mentatl °* **^° * J 5 "" 
condition to the direct determination of 3D structure or motion LTH84, NH85J. 
It may seem to be an impossible task to estimate the discontinuities before the 
vector field itself, but there is already encouraging evidence that such informa- 
tion can indeed be found. One approach in this direction is the analysis of local 
histograms of possible displacements and the search for "vanishing bars lbU87j, 
and another is the detection of significant non-vanishing divergence of the local 
non-iteratively determined displacement vector field [NA89]. While this type of 
approach is appealing due to its noniterative character, these results are at best 
preliminary, and a precise analysis of their reliability is missing. These approaches 
indicate, however, that it is indeed possible to extract the important information 
about discontinuities at a very early stage of the vision process. 

1.2 Observations which motivate a new scheme 

For the (Welopement of a new scheme to the detection of displacement disconti- 
nuities and the segmentation of motion fields the following observations will be of 
importance: 

• The vector field determined by a regularization approach such as the "optical 
flow" or a gaussian weighted least squares fit is more or less correct in the 
interior of a displaced region, apart from the distorting effects discussed in 
[VP87]. The size of the region determines how much smoothing of the vector 
field is appropriate. Within these limits the noise is better suppressed and 
thus the vector field is more reliable, if more smoothing is applied. The vector 
field changes dramatically, however, when the smoothing is too strong. 

• In the interior of a region the measured local data are compatible with the 
regularized vector field. The only deviations are due to noise. 

• Near the occlusion boundaries the estimates of the regularized fields devi- 
ate significantly from the actual displacement. Near occlusion boundaries 
a less regularized vector field reflects the actual displacement better than a 
smoother vector field, while at the same time it is more sensitive to noise. In 
particular the measured data themselves are significantly incompatible with 
the regularized vector field. The consequence of this is that close to occlu- 
sion boundaries essentially all members of a family of regularized solutions 
are incompatible with each other. 

• An observation made by David Marr is that discontinuities of any given 
dimension are relatively sparse and they are continuous in the next lower 
dimension [Mar82]. This means that discontinuites of 2D regions are essen- 
tially continous lines. These lines can have again relatively few directional 



discontinuities in the form of endstops and breakpoints. These are, however, 
not considered in this paper, but a smooth and closed occlusion boundary is 
assumed. 

• If there is an actual boundary, it shows up at multiple scales. This was 
formulated by David Marr as the spatial coincidence assumption [Mar82]. 

• The detection of boundaries due to motion alone is instantaneous for the 
human visual system. This can be easily found out with a moving random 
dot pattern. Therefore there must be a fast mechanism with (at most) a few 
iterations to find these discontinuities. 

As a consequence of these observations the following strategy is followed. 

1.3 General strategy 

A t the beginning of the process a family of regularized displacement vector fields is 
determined. According to [VP87] it is not critical which particular regulanzation 
method is applied. Therefore a computationally cheap noniterative approach of lo- 
cal weighted least squares estimates is followed. With this approach the calculation 
of the family of regularized vector field can be decomposed into the construction of 
a kind of Gaussian pyramid of moments and a pointwise calculation at each node of 
the pyramid. With this procedure not only the actual displacement estimates but 
also the sum of squared deviations can be determined. Each level of this pyramid 
represents one level of regularization, with the measured data at the bottom, and 
the global average displacement at the top. 

The most critical part of the procedure is to calculate the compatibilites be- 
tween the different levels of the pyramid. It will be shown that the traditional 
Euclidian distance between properties corresponds to the difference of the sums of 
squared deviations, when only mean values are estimated. For comparing regres- 
sion coefficients as in the case of displacement information euclidian distances are 

not suitable. 

The difference of sums of squared deviations with respect to the displacement 
model is a candidate for a general compatibility measure, with the nice properties of 
a metric. There are two drawbacks, however. First there is no inherent interpreta- 
tion in this measure, which also means that for quantitative use a heuristic scaling 
parameter needs to be introduced. This occurs, wherever the sums of squared 
deviation or euclidian distances are used as compatibility measures. Secondly it is 
impossible to use the sums of squared deviations for comparing different models 
which all explain the data. In motion interpretation this could, for example, be 
pure translation versus translation with rotation and scaling. The sums of squared 
deviations is a monotonously decreasing function of the number of parameters, and 
it vanishes when there are as many parameters as observations. 



In order to overcome these two drawbacks, the criterion of minimal description 
length (MDL) can be used as a compatibility measure to compare displacement 
models at different levels of the pyramid. Although this compatibility measure is 
not a metric, it has a probabilistic interpretation and it can be used to compare 
models of different complexity. Even when only a single model is used, compati- 
bility defined by the MDL criterion is less sensitive to structural variations in the 
images than the other measures. 

The noisy evidence of each individual scale leads to the problem of combining 
evidence from multiple locations and scales. Due to the weak contrast of local 
measurements of displacement discontinuities, the "continuity of discontinuities 
and the "spatial coincidence assumption" [Mar82] serve as guiding principles, al- 
lowing displacement discontinuities only if there is combined evidence from the 
neighborhood and from several scales. 

A pyramidal structure with the local links between its levels opens the possi- 
bility of constructing a model that makes use of these constraints. 

1.4 Outline of the paper 

The next section discusses the minimum description length approach, in particular 
with its application to the regression problem. 

In section 3 the question of compatibility of two sets of data with respect to a 
given model is discussed. The displacement estimation problem is described as a 
regional regression problem. Three different compatibility measures are presented. 

Section 4 describes the actual scheme, where the individual steps are described 
and illustrated with an example of a random dot stereogram. At the end the result 
of applying the algorithm to a real-world image pair from a sequence is shown. 

Section 5 concludes the paper with a discussion of the approach and an outlook 
to possible extensions. 

2 The Minimum Description Length Approach 

The extension of the widely applied maximum likelihood principle is motivated by 
the question what an overall best model for a given set of data is. In order to 
have a clear definition of what a "best" model is, not only the goodness of fit of a 
model, but also the complexity of the model itself have to be considered. In this 
sense, the "best" model is the simplest that can explain a given set of data. While 
this as a general statement has been known at least since "Occam's Razor" and 
has been a philosophical guideline in the developement of the natural sciences, it 
is far from trivial to formalize this principle in such a way that it is applicable to 
determine a specific model. 



2.1 General background of minimum description length 

The foundations of a formalization were laid in information theory, by the intro- 
duction of an algorithmic definition of information [Sol64, Kol65, Cha66], which 
defines the information content of a string by the minimal length of a program to 
describe this string. A limitation to the application of this idea was already set 
by one of its inventors, stating, that no program can automatically generate such 
a program or description of minimal length [Kol65]. This essentially rules out a 
fully automated process of general induction. It does not rule out, however, the 
testing of any number of models— invented by human ingenuity— by this criterion 
This latter approach mirrors the philosophy of critical rationalism widely adopted 

in the field of science [Pop59]. 

The practical applications and further theoretical justifications of this concept 
were developed independently by C.S. Wallace and his coworkers [WB68, BW73, 
WB75, GW84, WF87] and by J. Rissanen [Ris76, Ris78, HR82, Ris83, Ris86, 
Ris87l' Both envisage primarily the statistical problem, how to choose an optimal 
set of parameters within a given model class. The length of the description of 
the parameters is then interpreted as the negative logarithm of a Bayesian prior, 
and the best model is found by maximizing the product of conditional and prior 
probabilities. Putting it another way, the optimal model is selected by minimizing 
the sum of the negative binary loglikelihood of the data given the model, and the 
length of the shortest code to describe the parameters. Although the approach 
appears to be formally Bayesian, there are fundamental differences. The most 
prominent difference is that the relevant distribution for the MDL concept-the 
"marginal prior distribution" [WF87] or the "stochastic complexity" [Ris87j-is a 
distribution on the data, and not on a model. 

One key question to make the approach feasible is, how to encode and there- 
fore how to optimally truncate the— typically real- valued— parameters. The an- 
swer was found by both researchers independently, although the resulting formulas 
look rather different, and appear to be partly inconsistent. The parameters are 
truncated in such a way, that the truncation error equals the statistical estimation 
error of the model (see e.g. [Ris83, WF87]). 

2.2 Minimum Description Length of Regression 

One important motivation to apply the principle of Minimum Description Length 
to the regression problem is related to the question of the best choice of regressor 
variables. Specifically when modelling a ID signal or a 2D surface with piecewise 
polynomials naturally one has to somehow determine the optimal degree of the 
elements. Interweaved with this is the problem of finding the optimal breakpoints 
in ID or the region boundaries in 2D. Both problems can be formulated in terms 
of a descriptive language. By minimizing the length of the code in this language 



the optimal description is found. The gray value segmentation problem has been 
treated in this spirit by Y. Leclerc [Lec89]. 

The main emphasis here is laid on measuring the statistical compatibility of 
two sets of data with respect to a given model. This also means to find the 
optimal degree of a polynomial. The partitioning problem is here approached m 
a way that is based on fundamental insights about discontinuities in the visual 
input data, thereby avoiding the difficult and time-consuming minimization of 
functional that include explicit boundary models. 

2.2.1 Selection of variables 

Estimating the best degree of a polynomial for a given set of data points is a spe- 
cial case of the general selection of variables problem. The model likelihood with 
the maximum likelihood estimates of the regressor variables naturally increases 
with the number of variables used. Therefore the otherwise powerful maximum 
likelihood principle fails to give an answer to the important question, which vari- 
ables explain the observed data best. The description length approach extends 
the maximum likelihood principle by including a term that measures the model 
complexity. 

2.2.2 A simple measure of model complexity 

An intuitive approach to capture the notion of complexity is the following [Foe89]. 
When the variables explain the data optimally, then the residual variance is entirely 
due to noise. Assuming Gaussian noise, the standard deviation of the noise is 
proportional to V™, with n the number of observations. A "good" variable, which 
"explains" the data, should therefore reduce the residual standard deviation by 
more than y/n. Thus a good term to express the cost of k parameters is 

2 lo S2 n 

The coding length interpretation of this term is that in the presence of Gaussian 
noise for the required precision each parameter needs to be encoded with ±log 2 n 
bit. The complete description length L{x\9) for a model with the parameter vector 
6 and the n observations x therefore is 

k 

L(x\9) = -log 2 p{x\e) + -\og 2 n (1) 

This criterion has been derived by J. Rissanen [Ris78] and independently by G. 
Schwarz [Sch78], who derived it for the exponential family of distributions without 
referring to description length. fc 

Despite its apparent simplicity, for large enough n the term 2 log 2 n can be 
proven to represent the asymptotically optimal model complexity [Ris89]. 



2.2.3 Minimum description length for multiple regression 

For a finite number of observations, however, another approximation of the de- 
scription length is needed. This criterion has been independently derived by C.S. 
Wallace [WF87] and J. Rissanen [Ris87]. Let the general linear model of multiple 
regression be of the form 

i=i 

which is in matrix notation 

p = G-9 ( 3 ) 

where G represents the set of regressor variables and 9 is the parameter vector to 
be determined. The n measurements x { are assumed to be independent samples 
from a normal distribution 

The least squares estimator 9* of the parameter vector 9 is determined by mini- 
mization of the sum of squared deviations S with respect to 9: 

S = (x - tf(x - /») (5) 

9* = (G T G)- 1 G T x (6) 

Therefore the optimized sum of squared deviations S* is 

S * = /, _ 0* T G T G9* (7) 

and the probability at the least squares estimate 9* is 

n 

According to both authors the generalized description length becomes 

L(x) = J log 2 2*e + £ log 2 ^ + \ log 2 \G T G\ (9) 

The first two terms are the negative log of the probability in equation 8. The last 
term, the log of the determinant of the curvature of the log-likelihood function 
at its maximum, represents the complexity of the parameters. The interpretation 
in terms of description length is, that the curvature of the negative log-likelihood 
function provides a yardstick by which the precision of the parameters is deter- 
mined. The parameters are considered to be of optimal precision, when the error 
caused by truncation of the parameters is the same as the statistical error of the 
measurement. 

9 



It needs to be mentioned that this formulation of the description length is not 
invariant with respect to the scale of the data. It also becomes undefined, when 
the so called design matrix G T G becomes singular, which can easily happen m a 
local analysis of image properties. In the framework of "Stochastic Complexity 
there is, however, a formulation of the description length, which is always defined 
and which is completely scale invariant [Ris89]. It is briefly described in Appendix 
D. From there it can be seen that equation 9 is an approximation to the exact 
description length. 

3 A model based homogenity criterion 

Here a criterion of homogenity is developed that is based on the minimal descrip- 
tion length of data with respect to a model. It is shown to be a logical extension of 
euclidian distance used in gray value segmentation and the sum of squared devia- 
tions applied as a merging criterion in the context of hierarchical Cluster Analysis 
and traditional Split-and Merge algorithms. 

In order to show the correspondence to existing models for gray value segmen- 
tation, the simple case of the ordinary gray value segmentation problem is analysed 
first with a model based view. Then the ID displacement estimation with a locally 
constant vector field is discussed. It will be shown that this extends easily to 2D 
and to more general and complex models. 

3.1 Homogenity for gray value segmentation 

The model of the image 7 is a set of regions R k with constant gray values within 
the regions, each region consisting of n k pixels with the gray value x { . With a 
Gaussian noise model the mean values /x fe and the variances a\ are a sufficient 
statistics to describe each region: 
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The parameters of interest, the estimated mean value ^ and the sums of squared 
deviations S k are determined: 

A = —•£« (11) 

n k i<R k 

s k = E(*-rt) 2 < 12) 

itRk 
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ieR k 

From the definition of S k it is obvious that it grows monotonously with each 
additional x 5 that is added to the region. Therefore it is possible to define a 
feature distance measure between a region R k and an xj as the increase of S k 
when xj is included in R k . This generahzes to the distance between two regions 

R k and Rr. , , 

D(k,l) = S kul -S k -S l (13) 

where S kul are the sum of squared deviations, when R k and ft are joined. Another 
view of this distance measure is the decrease of S kut when a member or a subregion, 
say R k , of region R ku i is taken out of that region. Trivially the sum of squared 
deviations of a single item is 0. 

With the simple model discussed here, the S k don't have to be calculated 
explicitly in order to determine the distance measure. As can be easily shown, the 
distance D{k,l) between two regions R k and Ri becomes 

D(M ) = iji-. W - ft -)* (14) 

n k + ni 

This relates the introduced distance measure to the traditional Euclidian dis- 
tance between mean values [Pav82], the only difference being the weighting term, 
which depends on the size of the regions involved. 

3.2 Homogenity of displacement vector fields 

The homogenity measure is now generalized to stereo and motion models. The first 
model is a regionally constant displacement u k and the ID "images" are locally 
approximated with first order polynomials. The intuitive understanding of ^this 
model is that the intensity differences between the two images are "explained" by 
the displacement, when a locally planar image model is assumed. The expression 
to be minimized with respect to u k is (See details in Appendix A) 

s fc = _5> f (fc - 1* -9if ( 15 ) 

UR k 

with hi the regularized intensity difference at location * and # the regularized 
estimate of the local gradient of the average of both images at location i. The 
Wi are the weights of the different locations within the support R k . The weighted 
least squares estimate for u k is 

HuR k Wjhjgj ^ 16 ^ 

EieR k w i9i 



ul 



11 



Therefore the sum of squared deviations of region k at the optimal estimated value 
u\ becomes 

Sk = Y, v>i{hi ~ u l ■ 9if 

ieR k 

= Ewi ^!^ (17) 

With this residual sum of squares the same kind of distance measure D(k, I) as in 
equation 13 can be generated. 

Although the sum of squared deviations is formally suitable as a compatibility 
measure, it has no obvious interpretation nor any invariance properties. In partic- 
ular it is a monotonously decreasing function of the number of parameters, and it 
vanishes when there are as many parameters as data points. 

A better measure is the negative log-likelihood at the maximum likelihood 
estimate 

Mk = n k log &) (18) 

This has a probability interpretation for a given model, and in the context of the 
Neyman-Pearson test it has been successfuUy used for gray value region segmen- 
tation [Yak75]. 

The likelihood also does not take into account the complexity of a model, 
however. Therefore by choosing a sufficiently complex model the likelihood can be 
made as large as needed. Only the minimum description length makes it possible 
to compare models of different complexity in a consistent way. 

As described in the last section, the part of the description length of a region 
that is not related to the geometry, is essentially 

™* = ?•"*.(£) + 3 • k * lftl (W) 

where C k is the design matrix of the statistical model. In the simple ID flow model 

it is EieR k 9i- , x . , 

The actual compatibility calculation is very simple. Suppose there are two 
neighbouring regions R k and R t . When an item or a region j that is not yet part 
of R k is inserted into R k , then the statistical part of the description length changes 

by the amount 

ADL(k,j) = DL kuj - DL k - DLj (20) 

If region j is to be compared with different regions, then the compatibility measure 
can be simplified due to the constant contribution of region j 

D(j,k) = DL kuj -DL k (21) 

12 



This "pseudo-distance" (it is not symmetric and may well be negative) can now 
be converted into a relative assignment probability Q{j, k) to a region A; of a set 

K ' 2- D{j ' k) e>?\ 

This obviously does not include the probability, that j may not belong to any of 
the possible groups. 

4 Description of the complete scheme 
4.1 The pyramidal architecture of the scheme 

A key to the whole concept is a pyramidal architecture, which can represent both 
a family of regularized model estimates as well as a partition of the image with 
piecewise homogeneous regions and sharp discontinuities between them. This ar- 
chitecture has first been used for gray value segmentation and the applied algo- 
rithm has become known as pyramidal linking [BHR81, PR81, Bur84]. Although 
the pyramidal linking algorithm applied there has proven to be inadequate for 
displacement vector field segmentation, the underlying pyramidal architecture will 
be used to develop a new scheme. 

The architecture of this pyramid can best be explained in one dimension. The 
extension to two (or more) dimensions can be done in a straightforward way by 
applying the ID concept in each dimension. Fig. 1 shows the graph of this pyramid 
with 16 nodes at the bottom level, representing the primary image data. 

The key property of this pyramid is that each node of a lower level (= son node) 
is connected to two nodes (4 in 2D) at the next higher level (= father nodes), except 
at the image boundary, where each son node is connected to only one father node 
Each father node has links to 4 son nodes (16 in 2D), which define the maximal 
support for any estimation at the father node. The links are represented by weight 
factors ti>(t, j) between and 1, linking son node i to father node j. 

Through these links information can be propagated in both directions. The 
bottom-up propagation of signal properties Jf from pyramid level (to) to level 
(ra + 1) is by weighted summation: 



■ 3 

t€son(j) 



The top-down propagation from level (to + 1) to level (to) is analogous: 



/*»>= £ «,(i,j)PJ m+1) (24) 

jefather(i) 
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Figure 1: Architecture of the pyramid structure used in ID 

For normalization purposes, the sum of the weight factors is constrained 

£ w(i,j) = l Vj (25) 

i€son(j) 

This means that each son node contributes with the total weight of 1 to the next 
level of the pyramid. When links to father nodes are cut completely in order to 
define a region, then obviously all weights are 0. This aspect will only be dealt 
with in such way here, that a maximum pyramid level is chosen a priori. All nodes 
at this level are assumed to define regions. 

Both these processes are in general space- variant linear operators, and they 
can be interpreted in such a way, that each father node represents a region of 
those son nodes with non-zero weight factors. In the general case these regions are 

overlapping. 

With the necessary properties propagated up, a weighted least squares estimate 
can be obtained for the region corresponding to a father node. In the simplest 
case of a model with regionally constant gray values, the grayvalues themselves 
and the number of contributing pixels (one at the bottom level) are propagated 
up according to equation 23, recursively from level to level, up to the top level 
of the pyramid, which consists of only one node. At each node of this pyramid 
a weighted mean value can be estimated, which corresponds to the image region 
that has non-zero links to this node. 

When each father node has the same set of binomially distributed weights, then 
a kind of Gaussian pyramid is created. When the weights are strictly or 1, then 
the image is segmented into non-overlapping regions, and the weights define the 

14 




Figure 2: Graph of a complete segmentation in ID 

partition. . 

The top-down propagation process can best be understood when the image is 
fully segmented, so that each son node has exactly one father node, to which it 
is linked with a non-zero weight. The corresponding graph is a tree, indicated in 

Fig. 2. 

In this case the top-down propagation is particularly simple. Starting from 
the root nodes of the tree, which represent the region properties, according to 
equation 24 each son node inherits these properties from its father node. Therefore 
the properties of the whole region are distributed to all contributing pixels, which 
is, for example, relevant to show the result of a segmentation. 

The other extreme is the situation with those space-invariant weights, by which 
the Gaussian pyramid was created. Then the top-down propagation from a certain 
level of the pyramid effectively means a smoothing operation of the properties that 
were propagated up from the bottom level. The higher the referred level of the 
pyramid, the larger the effective filter size. 

The segmentation task can be formulated within this pyramidal architecture. 
It is assumed that the image consists of reasonably large coherent regions. Within 
these regions the properties are assumed to be homogeneous. As a consequence 
the linking weights in the interior of regions can be assumed to be arbitrarily 
distributed as long as the corresponding father nodes also belong to the same 
region. In particular the weights can be binomially distributed with respect to 
a particular father node, which is the initial weight configuration of the scheme. 
The critical parts of the image are its discontinuities. Discontinuities are locations 
where the linking weights from neighboring son nodes should be non-zero only 
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Figure 3: Random dot stereo pair 

in the direction pointing away from the discontinuity towards the interior of the 
region. This applies even in the case when the region is represented by just one 
father node. On the other hand, if evidence can be found that the links from two 
neighboring son nodes should point in opposite directions, a local hypothesis of 
a discontinuity can be made. The segmentation task thus consists of gathering 
evidence about potential discontinuites by analysing the initial pyramid which 
has space-invariant binomially distributed linking weights, making this evidence 
consistent within and between the pyramid levels, and finally changing the weights 
in such a way that a segmented tree or at least something close to it results. 

For the purpose of illustration an example of a random dot stereo pair will be 
used to show the different stages of the method. The original pair is displayed in 
Fig 3. In the center of the second image a square is displaced by 2 pixels. Both 
images also contain random noise. 

4.2 Initialization 

At the beginning of the process a family of regularized displacement estimates is 

constructed. 

The basic procedure is demonstrated with the simple model of locally constant 
ID displacements. This model is described above in section 3.2 (see also Appendix 
A) As with any least squares model, the estimator as well as the minimal sum 
of squared deviations can be calculated by summing the required moments over 
the region of interest and performing a few scalar operations. Given the set ot 
moments at each pixel any degree of regularization can be achieved by smoothing 
all the moments before calculating the desired properties with scalar operations at 
each level of smoothness. Therefore by creating the Gaussian pyramid described 
above with all the moments a family of increasingly regularized vector fields is 
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available. . (o) • ^ 

Taking the local model of constant ID flow in an image with size rt pixels 
(for simplicity only one index is used here) from equations 16 and 17 it follows 
that for u* k and for S k the following four moments are required: 




This means that the elements 

^ = {1,*?,**,*?} (i = l.-.n (0) ) ( 26 ) 

form the elements of the bottom level of the pyramid. 

At the initialization stage a family of fully regularized solutions to the dis- 
placement estimation problem is created. This means that the moments of the 
successive levels of the pyramid are formed by bottom-up propagation of the mo- 
ments of their 16 son nodes at the level below (see equation 23). These are the 4 
closest nodes in each dimension. The weights w{i,j) define the support over which 
the least squares model is estimated at level j. The weighting factors are such 
that each son node distributes its information among its 4 father nodes according 
to equation 25. This guarantees that the total sum of contributing data elements 
is constant at each level of the pyramid. The initial weights of the 16 son nodes 
are binomially distributed in each dimension, which is the optimal 4 X 4-pixel ap- 
proximation to a 2D Gaussian distribution. This optimizes the localization of the 
model properties at the coarser levels of the pyramid. 

Fig 4 shows the family of regularized displacement fields. As there is only a one 
dimensional displacement, the scalar fields are gray level encoded. The different 
levels of the pyramid show displacement fields based on binomially weighted least 
squares with an increasing support. 

4.3 Measurement of compatibility 

Due to the way the pyramid is constructed it can be interpreted as a graph, where 
each son node is connected to 4 father nodes (in 2D) at the next higher level of the 
pyramid, which represent 4 directions. This representation has been used before 
for gray value and texture segmentation [BHR81, PR81, Bur84], but the algorithm 
described there is not adequate for the segmentation of displacement vector fields. 
By measuring the compatibility to all 4 father nodes, there are three possible 
outcomes: 

1. There is approximately the same compatibility to all father nodes, possibly 
with minor variations. This means that the son node is in the interior of a 
homogeneous region. 
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Figure 4: Family of regularised displacement estimates 
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2. There is a significant preference towards a certain direction. This is an indi- 
cation that the son node is close to a displacement discontinuity, especially 
when the corresponding neighbor node shows a preference in the other di- 
rection. Significance in this context has the interpretation that the creation 
of a piece of occlusion boundary is "cheaper" in terms of coding length than 
the enforcement of the model across this boundary. 

3. There is total incompatibility to any of the father nodes. This is an indication 
that at the pyramid level above there is too much smoothing with respect 
to this particular son node. Therefore this node (with all nodes "below" it) 
represents an independent region. 

Although the last case is of conceptual interest and is important when an actual 
labelling of regions is performed, it can in many cases be ignored when the pyramid 
is cut at a level that contains more nodes than the expected number of distinct 
regions. This typically is a level of 4 - 12 pixels square. At the "computationally 
expensive" levels of the pyramid only the first two alternatives are relevant. 

The compatibility D{i,j) betweeen a node i and its father node j is measured 
by the difference of description length between node j with the content oft included 
and node j with the content of node * removed. Assuming that i contributes with 
the weight w(i,j) to j before the measurement, then in analogy to equation 21 
D(i,j) is given by 

D{i,j) = DL{P^ + (1 - w(i, J))PM) " DL(Pt +1) ~ Hi,m (m) ) (27) 

The description length has an interpretation as the negative log of a probability, 

so D{i,j) can be converted to a normalized assignment probability as in equation 22 

Q(iJ) = y-——-^mj) (28) 

2-,j' =father(i) ^ 

The denominator makes sure, that the total assignment probability of i to the next 

level is exactly 1. 

The introduction of j3 becomes necessary because it is desirable to have a 
similar range of assignment probabilities at each level. This is achieved, when 
f3 is the inverse of the number of pixels contributing to the father node j. The 
interpretation of this choice is that the "per pixel description length" is measured. 
As the population in the pyramid increases by a factor 4 with each level, the choice 
for /? is 4" m , where m is the pyramid level (The bottom level of the pyramid is 0). 

4.4 Regularization of boundary evidence 

After the measurement of compatibilities between the pyramid levels at each node 
a set of 4 probabilities is given, measuring the local preference of each node to- 
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Figure 5: Measured assignment probability field at 2nd level 

wards one of 4 directions. These 4 probabilities are related to the 4 directions 
NW,NE,SW,SE. They weigh the directional preference of each node in the pyra- 
mid.' This quadruple of directional preference can be transformed into a vector 
field which directly shows the directional preference of a node and the strength of 
this preference. Associating the S-N direction with the y-axis, and the W-E direc- 
tion with the x-axis, the vector field components (F x , F y ) for directional preference 
are determined by 



(Qne + Qse) - {Qnw + Qsw) 
(Qne + Qnw) ~ {Qse + Qsw) 



(29) 
(30) 



The vector field has only 2 components, whereas the quadruple representation 
has 3 (the normalization determines the 4th). This is due to the possibility that 
the quadruple could represent preferences that are inconsistent with a directional 
interpretation. These are, however not considered as relevant. Therefore it is 
possible to convert both representations into each other. 

Fig. 5 shows this vector field at the second level of the pyramid. When the 
vector field is observed, it can be easily seen that it is very noisy in comparison 
to the ideal situation, where there should only be nonzero vectors on both sides of 
the displacement discontinuities. 

The discontinuities appear as sinks in this vector field. Therefore a good oper- 
ator to measure the strength of discontinuities is the divergence of this field. The 
lines of discontinuity are to be found where the negative valleys of the divergence 
of the directional preference field F are. The negative part of the divergence of the 
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Figure 6: Measured boundary evidence at all levels 

field at all levels is shown in Figure 6, indicating the measured boundary evidence. 
The sources of the field have no such meaningful interpretation, so for clarity the 
positive part of the divergence is set to 0. 

As can be seen from this figure, at the lower levels of the pyramid the assignment 
probability vector field is very distorted. By sufficiently smoothing the vector field, 
however, the errors are averaged out, but the field preserves its qualitative essential 
properties. This can be seen at Fig 7, where the vector field has been smoothed 
with a 5 x 5 Gaussian filter. The different levels of the pyramid are smoothed 
differently. Obviously the lowest level has to be smoothed with the largest filter 
mask. The sizes of these masks have been determined heuristically. 

Smoothing of the probability quadruples has the same effect. While the local 
directional contrast decreases the vector field becomes more coherent. The location 
of the sinks is also preserved, unless there is another sink nearby. In that case the 
two sinks will be attracted and with more smoothing they will merge into one. 

This means that by smoothing the probability field, smooth and stable bound- 
ary evidence can be found. A boundary's location will be correct, if there is no 
other boundary close by. In other words, the maximal smoothing area should be 
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Figure 7: Boundary evidence from smoothed probability field 
smaller than the smallest expected distance to another boundary. 

4.5 The spatial coincidence constraint 

An actual discontinuity should show up at the same location at multiple scales. 
This essentially is the spatial coincidence assumption formulated by David Marr 

[Mar82]. , ... . 

This constrains the possible discontinuities to those that coincide at multiple 
scales. In the proposed approach the spatial coincidence constraint is used to 
combine the evidence of several levels of the pyramid. 

Although the smoothing at a single level improves the coherence of the prob- 
ability field considerably, there are still undesirable fluctuations, especially at the 
highest level of resolution. This is a natural consequence of constructing local 
estimates from only few measured data points. 

By summing over the probabilities of several pyramid levels, the fluctuations 
at locations of no discontinuity can be expected to average out. Where there 
are discontinuities, however, the probabilities will reinforce each other in their 

directional preference. 

In order to relate the probabilities of different levels, the coarse level probabihty 
quadruples must be projected down one or more pyramid levels according to the 
formula, which is analogous to equation 24: 

Q {m) (iJ)= E «,(i,i)G (m+1) (',i) Vi ' J ' (31) 

iefather(i) 

This is optimally done with the binomially distributed weights w(i,j) that were 
used to create the initial pyramid. 

Fig. 8 shows the downprojected divergence of the field shown in Fig. 7. This 
shows that the downprojection effectively performs another smoothing step. 

When the smoothed probability quadruples of the first three levels are added in 
the above-described way, the resulting field has the divergence shown at the bottom 
of the pyramid display in Fig. 9. The two intermediate levels are constructed by 
collecting evidence from the levels themselves and from one level above. The fourth 
level only uses its own evidence. 
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Figure 8: Boundary evidence from downprojected probability field 




Figure 9: Final boundary evidence at all pyramid levels 
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Figure 10: Final estimate of displacement field 

4.6 Constructing the final model 

As the probability quadruples express a directional preference, the coherent prob- 
ability field determined by the regularization step and the spatial coincidence con- 
straint can now serve as a new set of new weights that define the properties of each 

nodes father nodes. 

Due to the smoothing the "contrast" of the probabilities has decreased, so that 
they can't be used as new Unking weight factors directly. Their qualitative trend, 
however, is correct in the sense, that the directional preference goes to opposing 
directions at the hypothetical lines of discontinuity. The conversion of this trend 
to actual linking weight factors is done up to now in a very heuristic way. The final 
normalized probability quadrupels are taken to a high power, say 20, and the result 
is normalized again. This greatly enhances the "contrast" of the probabilities. 

With these weights a new pyramid is created according to the bottom-up prop- 
agation scheme of equation 23, up to the pre-chosen top level of the pyramid. The 
moments of this level are then again propagated down to the bottom level. When 
the model is actually determined at the bottom level, the result can be visual- 
ized and used for further computations. Fig. 10 shows this final estimation of the 
random dot example. Although this is not yet a segmentation in the strict sense, 
it is obvious, that the displacements have a clear discontinuity around the black 
square. 

4.7 A real example 

Finally the method is tested on a real image sequence. The two frames used 
from this sequence are displayed at Fig. 11. One can see that the can is on a 
table. Although this is an example of 2D flow, the model of which is described in 
Appendix B, the vertical component of the displacement is so small that it can be 
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Figure 11: Pair from real image sequence 

neglected. Therefore the horizontal component can be displayed the same way as 
in the example before. Obviously a model of regionally constant flow will not be 
able to capture the field of the table motion. The necessary extension is to use 
a planar model for the displacement as described in Appendix C. The result of 
Fig. 12, which is based on the regionally constant model, shows clearly, however, 
the discontinous motion of the can in comparison to the background, except at the 
top right corner of the can, where the contrast between object and background is 
so low, that a good estimate is just not possible. 

5 Conclusions and discussion 

A new approach has been presented for the fast determination of discontinues 
displacement vector fields, which is closely related to the problem of finding a 
segmentation of the vector field. The last step of the actual segmentation has 
not yet been taken, this will be a natural follow-up project with its own specific 

problems. 

The approach is based on regional models of the vector field, and it uses gray 
level data as primary input, either the original gray value images or appropriately 

filtered images. 

The boundaries and the discontinous displacement vector field can be found 
within a single step, no iterations of any procedure are required. The elegance 
of the scheme is that boundaries are found instantaneously by demanding consis- 
tency of local model compatibility measures within a neighborhood and at multiple 
scales. In that way the "continuity of discontinuities" and the "spatial coincidence 
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Figure 12: Resulting estimate of ID displacement field 

constraint" are made practical. 

The current constraint on the boundary is simply its smoothness, This is sim- 
ilar to the chain-code boundary model as it is common in Markov Random Field 
models. Obviously there are more efficient boundary models such as polygons, 
quadratic curves or Bezier curves. Those will most probably result in a shorter 
code and will therefore allow more easily the segmentation of extremely thin ob- 
jects, which will hardly be found by the current model. 

At this point only very simple models have actually been tried. Although the 
description length criterion allows the comparison of different models, the current 
implementation will need to be extended to allow this comparison. In particular 
the planar and the second order models will have to be tested with real stereo and 
motion data. One aspect to consider is the increased storage requirement, when a 
more complex model is applied. 

The method is not limited to segmentation based on displacement vector fields, 
but can be applied with any model that can be set up as a regional least squares 
problem. Therefore the same method also might be used, for example, for texture 
based segmentation. 

The fact that the information about discontinuities is represented as a probabil- 
ity field allows a relatively easy integration of different vision modules. The same 
way as the probabilities from different scales, those from different vision modules 
can be integrated. There is no inherent problem of coupling constants, because 
the probabilities are all normalized. 
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Appendix 
A Estimation of locally constant ID flow 

The idea behind this model is that all intensity differences between image I^'(x) 
and image I^ 2 '(x) are caused by the constant displacement u. In order to minimize 
the approximation error, both images are treated in a symmetric way. Therefore 
the following integral has to be minimized with respect to u: 

jw(x)(&\z-\)-&Xz + \tfdx (32) 

The weighting function w(x) defines the support of the functional and the weights 
of the individual measurements within this support. By a local linear expansion 
the I^ k '(x) are approximated: 

/<*>(* + u) * I ( k \x) + u ■ /«(*) (33) 

Now it is possible to restrict x to the grid coordinates i. With 

in = i^\i) - 4 2 \i) 

and 

the integral to be minimized is transformed into a sum over the discrete support 
R and the weights w^, the sum of squared deviations Sr: 

SR = ^Wi{hi-u-gif (34) 

ieR 



This has the least squares solution 

Y,uR w i h i9i 



u = 



and the minimal sum of squared deviations S R 



(35) 



ieR EieR Wi9i 

= $>a 2 -«*(E^ 2 K (37) 

ieR ieR 
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B Estimation of locally constant 2D flow 

The derivation is very similar to the ID case, with the difference that now the 
displacement has the two components u and v and the gray value gradients have 
the components g x i and g y i. The weighted sum of squared deviations becomes 

Sr = Yl w i ( h i - u • 9xi - v ■ g yi ) 2 (38) 

ieR 



(39) 



with the least squares solution 

( U * \ = ( ^ Wi9xi ^ w i9xi9yi \ f £i WiKQxi \ 
V V * ) V T,iWi9xi9yi iliWiSfa ) \ T,iWihig y i ) 

and the minimal sum of squared deviations 

S'n = £ «**? - K O ( ^ Wi9li E < Wi9x 1 vi )(K) (40) 

C Estimation of planar ID flow in a 2D image 

This is the planar surface model relevant for stereo, when the images are given in 
normal coordinates and the epipolar line is parallel to the x-axis. The displacement 
model now depends on the image coordinates (a;;,t/;) 

u( x i-> Vi) = u o + «*« + hi (41) 

The weighted sum of squared deviations Sr becomes 

S R = £ W i ( h i ~ ( U + aX i + hi)9if (42) 

and the least squares estimates are 

f EiWigf EiWigfxi EiWigfyi \ I E;W;^5i \ 
= J2i w i9l x i Ei^gfxj EiWigfxiVi £;Wi^5;X; (43) 

\ T,itoi9iy% EiWigfxiyi EiWigfyf I \ ^Wikgiyi ) 

The minimal sum of squared deviations is analogous to the previous cases. 

D Stochastic Complexity and regression 

Unfortunately the criterion described in equation 9 is not invariant with respect 
to scale change of the the data. If both the observations as well as the regressor 
variables are multiplied with a factor s, then the additional term 

(fc + n)log 2 .s 
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appears, although the parameters are identical in both cases. This can influence 
the decision, which set of regressor variables is optimal, which is certainly not 
desirable. 

J. Rissanen has introduced the concept of "Stochastic Complexity", which elim- 
inates all possible redundancy of a coding system [Ris87, Ris89]. For a model class 
Mk it is defined as 

I(x\M k ) = - log f P{x\6)dit{0) (44) 

Essentially the difference to the minimal description length (MDL) is the inte- 
gration over all possible "priors" or models instead of chosing the one, where the 
product under the integral is minimal. As the integral is larger than the maxi- 
mum of its elements, naturally I(x\Mk) is smaller than the minimal description 
length. Although asymptotically the Stochastic Complexity is identical with the 
criteria above, it is optimal for data sets of any size and has the required invariance 
properties. 

The problem is, that the Stochastic Complexity can't be calculated analytically 
in many cases. Fortunately there is a closed form solution for the regression prob- 
lem. It requires the parametrization of the the prior distribution for the variance 
<7 in the form of the conjugate priors 

3 

7r((7) = tt-2 (±j 2 e-£-<7"t (45) 

and for the parameters as a normal distribution 7r(^) with zero mean and variance 
^, with a and c as nuisance parameters. The Stochastic Complexity is optimized 
with respect to the nuisance parameters. The derivation is described in [Ris89, 
page 126-130], here just the result is reported. The design matrix X = G T G is 
extended to S c = cl + G T G, which prevents it from becoming singular. Then the 
minimum sum of squared deviations also depends on c: 

S c = x T x - x T G(d + G T G)~ 1 G T x (46) 

The complete Stochastic Complexity is 

n 1 

I(*\9,c) = 2 log 2^+ 2 l0g2 ' Sc ' 

~flo & c + ^±ilo g2 ^-ilog ; r(^±i) (47) 

The minimization with respect to c has to be done numerically with the itera- 
tion: 

c = — (48) 

S c trX c + (n + l)x T G^ 2 G T x K } 

It has been proven [Ris90], that the Stochastic Complexity found this way is in- 
variant with respect to scale changes of regressor variables and observations. 
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